BUG: DataFrame repr with ArrowDtype with extension #54130

Merged
mroeschke merged 5 commits into pandas-dev:main from bug/pyarrow_ea_repr on Jul 18, 2023

Conversation

mroeschke
Member

@mroeschke mroeschke commented Jul 14, 2023

I think it makes sense for the python .type of a pyarrow ExtensionType to be its storage's .type.

@mroeschke mroeschke added the Output-Formatting (__repr__ of pandas objects, to_string), DataFrame (DataFrame data structure), and Arrow (pyarrow functionality) labels Jul 14, 2023
@mroeschke mroeschke added this to the 2.1 milestone Jul 14, 2023
dtype=ArrowDtype(ArrowPeriodType("D")),
)
result = repr(df)
expected = " col\n0 15340\n1 15341\n2 15342"
Member

@jbrockmendel jbrockmendel Jul 15, 2023

Where does "15340" come from? I'd expect something like "2012-01-01".

Member Author

I think this is because ArrowPeriodType stores period data as pa.int64. Since pyarrow.ExtensionType is only used for serialization, it appears the data doesn't go through the period reprs.
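
The integers in the expected repr are consistent with day ordinals counted from the Unix epoch, which is what a daily period would look like once it is reduced to int64 storage. A minimal sketch using only the standard library, assuming the "D" frequency counts days since 1970-01-01 (the helper name `ordinal_to_date` is made up for illustration and is not part of pandas or pyarrow):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # assumed origin for daily ordinals


def ordinal_to_date(ordinal: int) -> date:
    """Convert a day ordinal (days since the epoch) into a calendar date."""
    return EPOCH + timedelta(days=ordinal)


# The three values that appear in the expected repr of the test.
for ordinal in (15340, 15341, 15342):
    print(ordinal, ordinal_to_date(ordinal).isoformat())
```

Under that assumption, 15340 corresponds to 2012-01-01, which matches the date the reviewer expected to see.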

Member

I'm unclear on whether this should be considered "wrong" or just orthogonal to what this PR is doing. It seems like this would be an unhelpful repr to give a user.

Member Author

I think it's an orthogonal, incorrect behavior. This PR addresses a bug where pyarrow.ExtensionTypes didn't even show a repr.

Member

Makes sense.

Member

Maybe add a comment that this test shouldn't be interpreted as saying this repr is "correct"?

Member Author

Sure. Added a comment.

else:
raise NotImplementedError(pa_type)
elif isinstance(pa_type, pa.ExtensionType):
return type(self)(pa_type.storage_type).type
Member

Is this going to match the storage_type?

Member Author

Yeah, it returns the Python type of the storage_type.
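
The delegation discussed in this thread can be illustrated without pandas or pyarrow. The sketch below is a toy model only: `ToyStorageType`, `ToyExtensionType`, `_PYTHON_TYPES`, and `python_type_of` are hypothetical stand-ins invented for this example, not the actual ArrowDtype implementation. It shows the idea of resolving an extension type's Python scalar type by recursing into its storage type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyStorageType:
    """Hypothetical stand-in for a plain Arrow type such as int64."""
    name: str


@dataclass(frozen=True)
class ToyExtensionType:
    """Hypothetical stand-in for an extension type wrapping a storage type."""
    extension_name: str
    storage_type: "ToyStorageType | ToyExtensionType"


# Assumed mapping from storage type names to Python scalar types.
_PYTHON_TYPES = {"int64": int, "float64": float, "string": str}


def python_type_of(toy_type) -> type:
    """Resolve the Python scalar type, delegating extension types to storage."""
    if isinstance(toy_type, ToyExtensionType):
        # Mirrors the idea in the diff: ask the storage type instead.
        return python_type_of(toy_type.storage_type)
    try:
        return _PYTHON_TYPES[toy_type.name]
    except KeyError:
        raise NotImplementedError(toy_type) from None


period_like = ToyExtensionType("toy.period", ToyStorageType("int64"))
print(python_type_of(period_like))
```

In this toy model a period-like extension type backed by int64 resolves to `int`, which is also why a repr built on the storage values would show raw integers rather than formatted periods.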

@jbrockmendel
Member

Might also close #54063?

@mroeschke mroeschke merged commit 811610d into pandas-dev:main Jul 18, 2023
@mroeschke mroeschke deleted the bug/pyarrow_ea_repr branch July 18, 2023 16:50
@mroeschke
Member Author

> Might also close #54063?

I'll leave that open for now. I read that issue as asking whether there's a fix that could be backported to 2.0.4 (though I'm not sure there is one).

Labels
Arrow (pyarrow functionality), DataFrame (DataFrame data structure), Output-Formatting (__repr__ of pandas objects, to_string)
Development

Successfully merging this pull request may close these issues.

BUG: repr of DataFrame fails when using ArrowDtype with arrow extension type
2 participants